AITopics | text and image input

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mixed Signals: Decoding VLMs' Reasoning and Underlying Bias in Vision-Language Conflict

Pezeshkpour, Pouya, Aminnaseri, Moin, Hruschka, Estevam

arXiv.org Artificial Intelligence, Apr-15-2025

Vision-language models (VLMs) have demonstrated impressive performance by effectively integrating visual and textual information to solve complex tasks. However, it is not clear how these models reason over the visual and textual data together, nor how the flow of information between modalities is structured. In this paper, we examine how VLMs reason by analyzing their biases when confronted with scenarios that present conflicting image and text cues, a common occurrence in real-world applications. To uncover the extent and nature of these biases, we build upon existing benchmarks to create five datasets containing mismatched image-text pairs, covering topics in mathematics, science, and visual descriptions. Our analysis shows that VLMs favor text in simpler queries but shift toward images as query complexity increases. This bias correlates with model scale, with the difference between the percentage of image- and text-preferred responses ranging from +56.8% (image favored) to -74.4% (text favored), depending on the task and model. In addition, we explore three mitigation strategies: simple prompt modifications, modifications that explicitly instruct models on how to handle conflicting information (akin to chain-of-thought prompting), and a task decomposition strategy that analyzes each modality separately before combining their results. Our findings indicate that the effectiveness of these strategies in identifying and mitigating bias varies significantly and is closely linked to the model's overall performance on the task and the specific modality in question.
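The preference gap quoted in the abstract (percentage of image-preferred responses minus percentage of text-preferred responses) can be illustrated with a minimal sketch. It assumes each model response has already been labeled as following the image, the text, or neither; the function name `modality_bias` and the label strings are hypothetical, and the paper's exact scoring procedure may differ.

```python
from collections import Counter


def modality_bias(preferences):
    """Return image-preferred % minus text-preferred % over all responses.

    `preferences` is a list of labels, one per response. The labels
    "image", "text", and "other" are assumed for illustration only.
    A positive result means the image was favored, a negative one the text.
    """
    if not preferences:
        raise ValueError("need at least one labeled response")
    counts = Counter(preferences)
    total = len(preferences)
    image_pct = 100.0 * counts["image"] / total
    text_pct = 100.0 * counts["text"] / total
    return image_pct - text_pct


# Hypothetical labels for four responses on mismatched image-text pairs.
labels = ["image", "image", "text", "other"]
gap = modality_bias(labels)
print(f"{gap:+.1f}%")
```

Under this reading, a value such as +56.8% would mean image-following answers outnumber text-following ones by 56.8 percentage points, while -74.4% would mean the reverse.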

large language model, machine learning, natural language

arXiv: 2504.08974

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

OpenAI's new GPT-4 can understand both text and image inputs

Engadget, Mar-14-2023, 17:23:26 GMT

Hot on the heels of Google's Workspace AI announcement Tuesday, and ahead of Thursday's Microsoft Future of Work event, OpenAI has released the latest iteration of its generative pre-trained transformer system, GPT-4. Whereas the current-generation GPT-3.5, which powers OpenAI's wildly popular ChatGPT conversational bot, can only read and respond with text, the new and improved GPT-4 will also be able to accept images as input and generate text about them. "While less capable than humans in many real-world scenarios," the OpenAI team wrote Tuesday, it "exhibits human-level performance on various professional and academic benchmarks." OpenAI, which has partnered (and recently renewed its vows) with Microsoft to develop GPT's capabilities, has reportedly spent the past six months retuning and refining the system's performance based on user feedback generated from the recent ChatGPT hoopla. What's more, the new GPT has outperformed other state-of-the-art large language models (LLMs) in a variety of benchmark tests.

gpt-4, openai, text and image input

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
